Optimizing the timing of an end-of-outbreak declaration: Ebola virus disease in the Democratic Republic of the Congo

Following the apparent final case in an Ebola virus disease (EVD) outbreak, the decision to declare the outbreak over must balance societal benefits of relaxing interventions against the risk of resurgence. Estimates of the end-of-outbreak probability (the probability that no future cases will occur) provide quantitative evidence that can inform the timing of an end-of-outbreak declaration. An existing modeling approach for estimating the end-of-outbreak probability requires comprehensive contact tracing data describing who infected whom to be available, but such data are often unavailable or incomplete during outbreaks. Here, we develop a Markov chain Monte Carlo–based approach that extends the previous method and does not require contact tracing data. Considering data from two EVD outbreaks in the Democratic Republic of the Congo, we find that data describing who infected whom are not required to resolve uncertainty about when to declare an outbreak over.


Notation
We use the following notation throughout the Supplementary Text: • () is the probability mass function of the offspring distribution (for  = 0,1,2, … ).
• () is the probability mass function of the discrete (daily or weekly) serial interval distribution (for  = 1,2, … days or weeks), and () is the corresponding cumulative distribution function.
•  is the time up to (and including) which data are available (at which the end-of-outbreak probability is to be estimated), where the symptom onset time of the first case is taken to be time 0, and all times are either in whole days or weeks (dependent on whether daily or weekly disease incidence data are available).•  is the number of recorded cases to date.
• The recorded cases (up to and including time ) are labelled with integer IDs  = 1,2, … , , ordered by symptom onset time.•   is the symptom onset time of case , and we write  = ( 1 ,  2 , … ,   ).
•   is the ID of the (recorded or estimated) infector of case , and we write  = ( 1 , … ,   ).
We define  1 = 0 to specify that the index case was an imported case, and we assume for simplicity that all other cases arose as a result of local transmission (note that when considering multiple imported cases in Fig. S4, we assumed that some of our results remain valid in this scenario, as described in the caption to that figure).•  , is the number of cases infected by case  who go on to develop symptoms at time 0 ≤  ≤  (under a recorded or estimated transmission tree).We write  to denote the ( × ( + 1)) matrix with entries  , , and   = ( ,0 , … ,  , ) for the sequence of case numbers generated by individual , up to (and including) time .•   is the number of recorded transmissions generated by individual  up to (and including) time , and we write  = ( 1 , … ,   ).•   is a random variable giving the total number of secondary cases ever generated by individual , including those occurring after the current time, , and we write  = ( 1 , … ,   ).
Disease incidence data are characterised by the pair (, ), where the dependence on  is included to specify explicitly that exactly  cases have occurred up to time .The transmission tree is characterised by the vector of infectors, .

Derivation of end-of-outbreak probability given the transmission tree
Here, we derive the expression for the end-of-outbreak probability given both disease incidence data and the outbreak transmission tree (recorded or estimated), up to and including the current time,  (i.e., the end-of-outbreak probability for the traced methodequation (1) in the main text).
(S1) Here, the second and fourth equalities follow because the probability of each recorded case generating future cases depends only on their symptom onset time and information on the transmissions they have generated to date, and the third equality follows by the assumption that transmissions occur according to a branching process.Now, for integer  ≥ 0, we have: (S2) where ( −   )  , gives the probability that  , specified secondary cases generated by  develop symptoms at time  (defining this term to equal 1 when both ( −   ) and  , are zero), (1 − ( −   ))  gives the probability that  specified secondary cases develop symptoms after the current time, , and the multinomial coefficient, ) gives the number of ways in which the (  + ) total secondary cases can be divided into cases developing symptoms on each time up to time  and cases developing symptoms after time .Therefore, ) (  ), (S4) and Finally, for a negative binomial offspring distribution, (S8)

Derivation of likelihood of transmission tree given disease incidence data
Here, we derive the likelihood, Prob(  | ,  ), of the transmission tree characterised by  given disease incidence data characterised by (, ), which is used in the MCMC and enumerate methods for calculating the end-of-outbreak probability.
We first consider the probability of transmission frequency matrix  given the disease incidence data (where multiple distinct transmission trees may give rise to the same , as quantified later): (S15) In the MCMC method for calculating the end-of-outbreak probability, the exact constant of proportionality in equation ( S15) is not required to sample possible transmission trees from the likelihood, Prob(  | ,  ).On the other hand, when we used the enumerate method to calculate the end-of-outbreak probability, we used equation (S15) to calculate the relative likelihoods of all possible transmission trees, and then normalised these likelihood values to ensure that they sum to one.).Note that in our analyses, we shifted the symptom date of one case (ID 4) to a later date than the date reported in ( 17) to avoid a zero-day serial interval (24 April to 1 May).This later symptom onset date is consistent with epidemiological investigations undertaken at the time that suggest that individual ID 4 developed symptoms in May.The end-of-outbreak probability conditional on the current augmented transmission tree is plotted for each of the one in 1,000 MCMC iterations retained after thinning (note that the xaxis numbering includes iterations removed after thinning).The initial burn-in period of 2,000,000 iterations is indicated by the vertical black dashed line.Estimated effective sample sizes for the chains shown in panels A-C (each from a sample size of 8,000 iterations retained after burn-in and thinning) are 7,600, 4,300 and 3,900, respectively (to two significant figures).D-F.Corresponding trace plots of the log-likelihood.

Fig. S5 .
Fig. S5.Transmission tree for 2017 Likati EVD outbreak.Dates are in DD/MM format (all 2017).Note that in our analyses, we shifted the symptom date of one case (ID 4) to a later date than the date reported in (17) to avoid a zero-day serial interval (24 April to 1 May).This later symptom onset date is consistent with epidemiological investigations undertaken at the time that suggest that individual ID 4 developed symptoms in May.

Fig. S6 .
Fig. S6.Example MCMC trace plots for the 2017 Likati health zone EVD outbreak.A-C.Trace plots for calculations of the end-of-outbreak probability using the MCMC method with data from up to day 30 (A), day 50 (B) and day 70 (C) of the outbreak.The end-of-outbreak probability conditional on the current augmented transmission tree is plotted for each of the one in 1,000 MCMC iterations retained after thinning (note that the xaxis numbering includes iterations removed after thinning).The initial burn-in period of 2,000,000 iterations is indicated by the vertical black dashed line.Estimated effective sample sizes for the chains shown in panels A-C (each from a sample size of 8,000 iterations retained after burn-in and thinning) are 7,900, 8,300 and 7,900, respectively (to two significant figures).D-F.Corresponding trace plots of the log-likelihood.

Fig. S7 .
Fig. S7.Example MCMC trace plots for the 2020 Équateur province EVD outbreak.A-C.Trace plots for calculations of the end-of-outbreak probability using the MCMC method with data from up to day 25 (A), day 130 (B) and day 150 (C) of the outbreak.The end-of-outbreak probability conditional on the current augmented transmission tree is plotted for each of the one in 1,000 MCMC iterations retained after thinning (note that the xaxis numbering includes iterations removed after thinning).The initial burn-in period of 2,000,000 iterations is indicated by the vertical black dashed line.Estimated effective sample sizes for the chains shown in panels A-C (each from a sample size of 8,000 iterations retained after burn-in and thinning) are 7,600, 4,300 and 3,900, respectively (to two significant figures).D-F.Corresponding trace plots of the log-likelihood.